Question 1. After exploring the Manhattan plots, and qq-plots from each GWAS, what can you tell about the power of each study?

# source(here::here("networks/Manhattan_plot.R"))
system(paste("Rscript",
             here::here("networks/Manhattan_plot.R"), 
             "-p networks/MS.pvals.out",
             "-o networks/ms_manhattan_plot.pdf"
             )
       )

system(paste("Rscript",
             here::here("networks/Manhattan_plot.R"), 
             "-p networks/HT.pvals.out",
             "-o networks/ht_manhattan_plot.pdf"
             )
       )

# source(here::here("networks/qqplot.R"))
system(paste("Rscript",
             here::here("networks/qqplot.R"), 
             "-p networks/MS.pvals.out",
             "-o networks/ms_qqplot.pdf"
             )
       )

system(paste("Rscript",
             here::here("networks/qqplot.R"), 
             "-p networks/HT.pvals.out",
             "-o networks/ht_qqplot.pdf"
             )
       )

The MS GWAS appears to have more power than the HT GWAS, as it yields more significant associations.
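The comparison can also be put in numbers. The following is a minimal sketch, not part of the course scripts: `summarise_pvals` is a hypothetical helper, the Bonferroni threshold is an assumed choice of significance cut-off, and the p-values are simulated because the column layout of `networks/MS.pvals.out` and `networks/HT.pvals.out` is not shown here.

```r
# Hypothetical helper: summarise a vector of GWAS p-values.
summarise_pvals <- function(p, alpha = 0.05) {
  # Keep only valid p-values
  p <- p[!is.na(p) & p > 0 & p <= 1]
  # Convert p-values to 1-df chi-square statistics
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)
  c(n_tests      = length(p),
    # Hits surviving a Bonferroni correction (assumed threshold)
    n_bonferroni = sum(p < alpha / length(p)),
    # Nominally significant hits
    n_nominal    = sum(p < alpha),
    # Genomic inflation factor: observed vs. expected median chi-square
    lambda_gc    = median(chisq) / qchisq(0.5, df = 1))
}

# Simulated example: uniform null p-values plus five strong signals
set.seed(1)
p_sim <- c(runif(10000), rep(1e-10, 5))
summarise_pvals(p_sim)
```

Running the same summary on the p-values of each study would give the counts behind the visual impression from the Manhattan plots, and the inflation factor gives a one-number reading of each QQ plot.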




Question 2. Using Cytoscape, analyze the PPI and describe its main network.



Plot: Full Parent PPI Network

Parent PPI full network


The image above shows the full network. Analyzing it in Cytoscape produced the following summary statistics and plots:

Summary Statistics
Number of nodes: 8960
Number of edges: 27724
Avg. number of neighbors: 6.363
Network diameter: 13
Network radius: 7
Characteristic path length: 4.382
Clustering coefficient: 0.088
Network density: 0.001
Network heterogeneity: 2.063
Network centralization: 0.033
Connected components: 164
Analysis time (sec): 34.270

Plot: Parent PPI Network Degree Distribution

Parent Degree Distribution

Plot: Parent PPI Betweenness by Degree Distribution

Parent Betweenness by Degree Distribution. The degree distribution is consistent with a scale-free network.
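A rough way to check the scale-free reading is to fit a straight line to the degree distribution on log-log axes. The sketch below is illustrative only: `fit_power_law` is a hypothetical helper, the degrees are simulated rather than exported from the Cytoscape node table, and a linear fit on log-log axes is a heuristic, not a rigorous test for a power law.

```r
# Hypothetical helper: regress log10 P(k) on log10 k.
fit_power_law <- function(degree) {
  degree <- degree[degree > 0]
  counts <- table(degree)
  d <- data.frame(k   = as.numeric(names(counts)),
                  p_k = as.numeric(counts) / length(degree))
  fit <- lm(log10(p_k) ~ log10(k), data = d)
  c(slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}

# Simulated heavy-tailed degrees (inverse-transform sampling from a Pareto law)
set.seed(1)
deg_sim <- floor(runif(5000)^(-1 / 1.5))
fit_power_law(deg_sim)
```

A clearly negative slope with a reasonably good linear fit is what a scale-free network would be expected to show; a few hub proteins with very high degree dominate the right-hand tail.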




Question 3. Using Cytoscape, find the first order networks (p<0.05) for each GWAS.

Plot: First order network for MS GWAS

MS Parent Network

Plot: First order network for HT GWAS

HT Parent Network
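The networks here were built in Cytoscape, but the underlying filter is simple enough to sketch in R. This assumes that a first order network keeps only interactions in which both proteins have a gene-level p-value below 0.05; `first_order_network`, the column names `source` and `target`, and the toy data are all hypothetical.

```r
# Hypothetical helper: keep interactions between two significant genes.
first_order_network <- function(edges, pvals, alpha = 0.05) {
  sig <- names(pvals)[pvals < alpha]
  edges[edges$source %in% sig & edges$target %in% sig, , drop = FALSE]
}

# Toy PPI edge list and gene-level p-values
edges <- data.frame(source = c("A", "A", "B", "C"),
                    target = c("B", "C", "D", "D"),
                    stringsAsFactors = FALSE)
pvals <- c(A = 0.01, B = 0.03, C = 0.20, D = 0.04)

# Gene C is not significant, so only A-B and B-D remain
first_order_network(edges, pvals)
```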




Question 4. Source “Pathway_permutation.r”. Are the first order networks from both GWAS more connected than expected? What does this mean?

# source(here::here("networks/Pathway_permutation.R"))
system(paste("Rscript",
             here::here("networks/Pathway_permutation.R"), 
             "-p networks/parent_PPI.sif",
             "-o networks/q4_pathway_permutation.pdf"
             )
       )
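The idea behind "more connected than expected" can be illustrated with a small permutation test. This sketch is not the `Pathway_permutation.R` script: `count_internal_edges` and `connectivity_permutation_test` are hypothetical helpers, and the null model (uniformly sampled node sets of the same size, ignoring node degree) is an assumption.

```r
# Count edges whose two endpoints both fall in `genes`
count_internal_edges <- function(edges, genes) {
  sum(edges$source %in% genes & edges$target %in% genes)
}

# Hypothetical helper: empirical p-value for the observed edge count
connectivity_permutation_test <- function(edges, genes, n_perm = 1000) {
  all_nodes <- unique(c(edges$source, edges$target))
  observed  <- count_internal_edges(edges, genes)
  # Null distribution: edge counts among random node sets of equal size
  null <- replicate(
    n_perm,
    count_internal_edges(edges, sample(all_nodes, length(genes)))
  )
  list(observed = observed,
       p_value  = (sum(null >= observed) + 1) / (n_perm + 1))
}

# Toy network: a 5-node clique embedded in a sparse 50-node ring
set.seed(1)
nodes  <- paste0("g", 1:50)
ring   <- data.frame(source = nodes,
                     target = c(nodes[-1], nodes[1]),
                     stringsAsFactors = FALSE)
clique <- as.data.frame(t(combn(nodes[1:5], 2)), stringsAsFactors = FALSE)
names(clique) <- c("source", "target")
edges  <- unique(rbind(ring, clique))

connectivity_permutation_test(edges, nodes[1:5])
```

A small empirical p-value means that the significant genes interact with one another more often than randomly chosen proteins do, which suggests that they cluster in shared pathways rather than being scattered across the interactome.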




Question 5. Run BINGO App on all nodes from largest connected component. What biological processes emerge from the first order networks?

Plot: Network produced by taking the largest connected component of the full parent network and performing BINGO analysis

Largest component parent BINGO

Table: Top results from BINGO analysis of largest component of parent network

All big component BINGO results

Plot: Network produced by taking the largest connected component of first order network for MS and performing BINGO analysis

MS BINGO Network

Table: Top results from BINGO analysis of first order network for MS

MS BINGO Top Results

Among the top GO term enrichments from the BINGO analysis (above), immune-system-related processes clearly emerge as important.




Question 6. Map and color known MS and HT genes onto their respective first order nets. Interpret results.

Plot: MS first order net, known MS genes colored in yellow

MS gene coloured on first order net

Plot: HT first order net, known HT genes colored in yellow

HT genes coloured on first order net




Question 7. Repeat steps 3 and 4 with directed protein network from PNAS paper.

7a. Repeat Step 3: find the first order networks (p<0.05) for each GWAS.

Plot: First order MS network (subset of the directed network)

First order MS network

Plot: First order HT network (subset of the directed network)

First order HT network

7b. Repeat Step 4: Source “Pathway_permutation.r”. Are the first order networks from both GWAS more connected than expected? What does this mean?

# source(here::here("networks/Pathway_permutation.R"))
system(paste("Rscript",
             here::here("networks/Pathway_permutation.R"), 
             "-p networks/Directed_PPI.sif",
             "-o networks/q7_pathway_permutation.pdf"
             )
       )




Question 8. Color nodes by controllability category (dispensable, indispensable, neutral).

Plot: Full directed network coloured by controllability: blue (dispensable), red (indispensable), grey (neutral).

Full Directed Network Coloured

Plot: MS first order Network coloured by controllability: blue (dispensable), red (indispensable), grey (neutral).

MS first order network coloured

Plot: HT first order Network coloured by controllability: blue (dispensable), red (indispensable), grey (neutral).

HT first order network coloured


Question 9. Repeat step 6. Are MS-associated genes more enriched in any controllability category? Interpret.

9a. Repeat step 6: Map and color known MS and HT genes onto their respective first order nets. Interpret results.

Plot: MS first order net (from directed network), with known MS genes colored in red

MS first order network, color MS genes

Plot: HT first order network, with known HT genes colored red

HT first order network, color HT genes

9b. Are MS-associated genes more enriched in any controllability category? Interpret.

The null hypothesis to test is: The number of controllable genes in the MS-associated first order network is consistent with random sampling of controllable genes from the full directed network.

# Number of nodes in the MS-associated first order network
# (k, the number of draws, in the hypergeometric parameters)
k <- 546

# Number of controllable nodes in the directed network,
# i.e. dispensable + indispensable (8 is the number of unlabelled nodes)
m <- 3677 - 8

# Number of remaining (non-controllable) nodes, given 6338 nodes
# in the full directed network
n <- 6338 - m

# Observed number of controllable MS-associated genes
q <- 317

# Upper tail of the hypergeometric distribution, P(X > q)
phyper(q = q,
       m = m,
       n = n,
       k = k,
       lower.tail = FALSE)
## [1] 0.4493194

This p-value suggests that MS-associated genes are not enriched for controllable genes. This is unsurprising, given that the proportion of controllable genes among the MS-associated genes (317/546 \(\approx\) 58%) is very similar to the proportion of controllable genes in the entire directed network (3669/6338 \(\approx\) 58%).
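One detail of the test is worth making explicit. With `lower.tail = FALSE`, `phyper` returns P(X > q), whereas an enrichment test is often phrased as "at least as many as observed", i.e. P(X ≥ q). The sketch below reuses the counts from the code above and computes both tails; it is offered as a check on the convention, not as a replacement for the reported result.

```r
# Counts taken from the test above
k <- 546        # nodes in the MS first order network (draws)
m <- 3677 - 8   # controllable nodes in the directed network
n <- 6338 - m   # remaining nodes in the directed network
q <- 317        # controllable nodes observed in the MS network

# P(X > q): what phyper() returns with lower.tail = FALSE
p_greater <- phyper(q, m, n, k, lower.tail = FALSE)

# P(X >= q): also counts the observed value itself
p_at_least <- phyper(q - 1, m, n, k, lower.tail = FALSE)

# The two tails differ by exactly the probability of observing q
stopifnot(isTRUE(all.equal(p_at_least - p_greater, dhyper(q, m, n, k))))

c(p_greater = p_greater, p_at_least = p_at_least)
```

Because the observed count sits close to its expectation under the null (k × m / 6338 ≈ 316), both tails are far from any conventional significance threshold and the interpretation does not change.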